Interactively Picking Real-World Objects with Unconstrained Spoken Language Instructions
Abstract
Comprehension of spoken natural language is an essential skill for robots to communicate with humans effectively. However, handling unconstrained spoken instructions is challenging due to (1) the complex structures and wide variety of expressions used in spoken language, and (2) the inherent ambiguity of human instructions. In this paper, we propose the first comprehensive system for controlling robots with unconstrained spoken language, which is able to effectively resolve ambiguity in spoken instructions. Specifically, we integrate deep learning-based object detection with natural language processing technologies to handle unconstrained spoken instructions, and propose a method for robots to resolve instruction ambiguity through dialogue. Through experiments in a simulated environment and on a physical industrial robot arm, we demonstrate that our system understands natural instructions from human operators effectively, and show how an interactive clarification process achieves higher success rates on the object picking task.
Similar resources
Robust Natural Language Dialogues for Instruction Tasks
Being able to understand and carry out spoken natural instructions even in limited domains is extremely challenging for current robots. The difficulties are multifarious, ranging from problems with speech recognizers to difficulties with parsing disfluent speech or resolving references based on perceptual or task-based knowledge. In this paper, we present our efforts at starting to address thes...
The gender congruency effect during bilingual spoken-word recognition.
We investigate the 'gender-congruency' effect during a spoken-word recognition task using the visual world paradigm. Eye movements of Italian-Spanish bilinguals and Spanish monolinguals were monitored while they viewed a pair of objects on a computer screen. Participants listened to instructions in Spanish (encuentra la bufanda / 'find the scarf') and clicked on the object named in the instruct...
Integration of visual and linguistic information in spoken language comprehension.
Psycholinguists have commonly assumed that as a spoken linguistic message unfolds over time, it is initially structured by a syntactic processing module that is encapsulated from information provided by other perceptual and cognitive systems. To test the effects of relevant visual context on the rapid mental processes that accompany spoken language comprehension, eye movements were recorded wit...
Eye movements and spoken language comprehension: effects of visual context on syntactic ambiguity resolution.
When participants follow spoken instructions to pick up and move objects in a visual workspace, their eye movements to the objects are closely time-locked to referential expressions in the instructions. Two experiments used this methodology to investigate the processing of the temporary ambiguities that arise because spoken language unfolds over time. Experiment 1 examined the processing of sen...
Flexible Use of Phonological and Visual Memory in Language-mediated Visual Search
In language-mediated visual search, memory and attentional resources must be allocated to simultaneously process verbal instructions while navigating a visual scene to locate linguistically specified targets. We investigate when and how listeners use object names in visual-search strategies across three visual world experiments, varying the presence and location of an added visual memory demand...
Journal title:
- CoRR
Volume abs/1710.06280
Publication date 2017